1.
PLoS Comput Biol ; 2(11): e159, 2006 Nov 17.
Article in English | MEDLINE | ID: mdl-17112314

ABSTRACT

With the mounting availability of genomic and phenotypic databases, data integration and mining become increasingly challenging. While efforts have been made to analyze prokaryotic phenotypes, current computational technologies either lack the high-throughput capacity needed for genome-scale analysis or are limited in their capability to integrate and mine data across different scales of biology. Consequently, simultaneous analysis of associations among genomes, phenotypes, and gene functions has not been feasible. Here, we developed a high-throughput computational approach and demonstrated for the first time the feasibility of integrating large quantities of prokaryotic phenotypes with genomic datasets for mining across multiple scales of biology (protein domains, pathways, molecular functions, and cellular processes). Applying this method to 59 fully sequenced prokaryotic species, we identified the genetic basis and molecular mechanisms underlying the phenotypes in bacteria. We identified 3,711 significant correlations between 1,499 distinct Pfam entries and 63 phenotypes, comprising 2,650 correlations and 1,061 anti-correlations. Manual evaluation of a random sample of these significant correlations showed a minimal precision of 30% (95% confidence interval: 20%-42%; n = 50). We stratified the 478 most significant predictions and subjected 100 of them to manual evaluation; 60 were corroborated in the literature. We also found 10 significant correlations between phenotypes and KEGG pathways, eight of which were corroborated in the evaluation, and 309 significant correlations between phenotypes and 166 GO concepts, evaluated using a random sample (minimal precision = 72%; 95% confidence interval: 60%-80%; n = 50). Additionally, we conducted a novel large-scale phenomic visualization analysis to provide insight into the modular nature of common molecular mechanisms that span multiple biological scales and are reused by related phenotypes (metaphenotypes). We propose that this method elucidates which classes of molecular mechanisms are associated with phenotypes or metaphenotypes and holds promise in facilitating a computable systems biology approach to genomic and biomedical research.
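
As a rough illustration of how a precision estimate from a sample of 50 manually checked correlations can carry a 95% confidence interval, the sketch below computes a Wilson score interval. The abstract does not say which interval method the authors used, so the Wilson method is an assumption here and its bounds need not match the reported 20%-42% exactly. The tally of 15 correct out of 50 is inferred from the reported 30% and n = 50.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Assumed tally: 15 of 50 sampled correlations judged correct (30% precision).
low, high = wilson_interval(15, 50)
print(f"precision = {15 / 50:.0%}, approximate 95% CI = {low:.1%} to {high:.1%}")
```

With only 50 evaluated items the interval is wide, which is why the abstract reports a "minimal" precision alongside its bounds.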


Subjects
Chromosome Mapping/methods; Evolution, Molecular; Genome, Bacterial/genetics; Quantitative Trait Loci/genetics; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Algorithms; Base Sequence; Conserved Sequence; Molecular Sequence Data; Phenotype; Sequence Homology, Nucleic Acid; Systems Integration
2.
BMC Genomics ; 7: 257, 2006 Oct 12.
Article in English | MEDLINE | ID: mdl-17038185

ABSTRACT

BACKGROUND: The ability to rapidly characterize an unknown microorganism is critical both in responding to infectious disease and in biodefense. To do this, we need some way of anticipating an organism's phenotype based on the molecules encoded by its genome. However, the link between molecular composition (i.e., genotype) and phenotype for microbes is not obvious. While several studies have addressed this challenge, none has yet proposed a large-scale method integrating curated biological information. Here we use a systematic approach to discover genotype-phenotype associations that combines phenotypic information from a biomedical informatics database, GIDEON, with the molecular information contained in the National Center for Biotechnology Information's Clusters of Orthologous Groups database (NCBI COGs). RESULTS: Integrating the information in the two databases, we are able to correlate the presence or absence of a given protein in a microbe with its phenotype, as measured by certain morphological characteristics or survival in a particular growth medium. With a correlation score threshold of 0.8, 66% of the associations found were confirmed by the literature, and at a threshold of 0.9, 86% were positively verified. CONCLUSION: Our results suggest possible phenotypic manifestations for proteins biochemically associated with sugar metabolism and electron transport. Moreover, we believe our approach can be extended to linking pathogenic phenotypes with functionally related proteins.
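
The idea of correlating protein presence or absence with a phenotype and then applying a score threshold can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not define its correlation score, so the phi coefficient (Pearson correlation on 0/1 vectors) is an assumption, and the COG labels and organism data are invented toy values.

```python
import math

def phi_coefficient(xs, ys):
    """Pearson correlation between two equal-length 0/1 vectors."""
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    n11 = n10 = n01 = n00 = 0
    for x, y in zip(xs, ys):
        if x and y:
            n11 += 1
        elif x:
            n10 += 1
        elif y:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom

# Toy data (hypothetical): one column per organism, 1 = present / phenotype observed.
presence = {
    "COG_A": [1, 1, 1, 0, 0, 0],
    "COG_B": [1, 0, 1, 0, 1, 0],
    "COG_C": [0, 0, 0, 1, 1, 1],
}
phenotype = [1, 1, 1, 0, 0, 0]

THRESHOLD = 0.8  # the lower of the two thresholds mentioned in the abstract
scores = {cog: phi_coefficient(vec, phenotype) for cog, vec in presence.items()}
hits = {cog: r for cog, r in scores.items() if r >= THRESHOLD}
print(scores)
print("associations above threshold:", sorted(hits))
```

Only positive scores are kept here; whether anti-correlated proteins should also count as associations is a design choice the abstract leaves open.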


Subjects
Computational Biology/methods; Databases, Genetic; Genomics/methods; Genes, Bacterial/genetics; Genotype; Phenotype; Reproducibility of Results
3.
Curr Opin Struct Biol ; 14(1): 104-9, 2004 Feb.
Article in English | MEDLINE | ID: mdl-15102456

ABSTRACT

Motions related to protein-protein binding events can be surveyed from the perspective of the Database of Macromolecular Movements. There are a number of alternative conceptual models that describe these events, particularly induced fit and pre-existing equilibrium. There is evidence for both alternatives from recent studies of conformational change. However, there is increasing support for the pre-existing equilibrium model, whereby proteins are found to simultaneously exist in populations of diverse conformations.


Subjects
Models, Molecular; Protein Conformation; Protein Interaction Mapping/methods; Proteins/metabolism; Allosteric Regulation/physiology; Protein Binding/physiology
4.
J Mol Biol ; 336(1): 115-30, 2004 Feb 06.
Article in English | MEDLINE | ID: mdl-14741208

ABSTRACT

Structural genomics projects represent major undertakings that will change our understanding of proteins. They generate unique datasets that, for the first time, present a standardized view of proteins in terms of their physical and chemical properties. By analyzing these datasets here, we are able to discover correlations between a protein's characteristics and its progress through each stage of the structural genomics pipeline, from cloning, expression, and purification to, ultimately, structural determination. First, we use tree-based analyses (decision trees and random forest algorithms) to discover the most significant protein features that influence a protein's amenability to high-throughput experimentation. Based on this, we identify potential bottlenecks in various stages of the structural genomics process through specialized "pipeline schematics". We find that the most significant properties of a protein are: (i) whether it is conserved across many organisms; (ii) the percentage composition of charged residues; (iii) the occurrence of hydrophobic patches; (iv) the number of binding partners it has; and (v) its length. Conversely, a number of other properties that might have been thought to be important, such as nuclear localization signals, are not significant. Thus, using our tree-based analyses, we are able to identify combinations of features that best differentiate the small group of proteins for which a structure has been determined from all the currently selected targets. This information may prove useful in optimizing high-throughput experimentation. Further information is available from http://mining.nesg.org/.
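
To give a feel for how a tree-based analysis ranks protein features, the sketch below computes information gain, the entropy reduction that a decision tree can use to choose a split. It is a simplified stand-in under stated assumptions: the abstract does not specify the splitting criterion, and the feature names and the eight toy proteins are invented for illustration only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting labels on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy data (hypothetical): 1 = structure solved, 0 = not solved.
solved = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "conserved": [1, 1, 1, 0, 0, 0, 0, 1],  # conserved across many organisms
    "has_nls":   [1, 0, 1, 0, 1, 0, 1, 0],  # nuclear localization signal present
}

gains = {name: information_gain(values, solved) for name, values in features.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
for name in ranked:
    print(f"{name}: gain = {gains[name]:.3f} bits")
```

In this toy set the conservation flag separates solved from unsolved targets better than the localization flag, mirroring the kind of ranking the abstract describes; a random forest would aggregate such rankings over many trees.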


Subjects
Genomics; Protein Conformation; Proteins/chemistry; Proteins/genetics; Algorithms; Computational Biology; Databases, Protein; Decision Trees; Protein Sorting Signals; Sequence Analysis, Protein
5.
Nucleic Acids Res ; 31(11): 2833-8, 2003 Jun 01.
Article in English | MEDLINE | ID: mdl-12771210

ABSTRACT

We present version 2 of the SPINE system for structural proteomics. SPINE is available over the web at http://nesg.org. It serves as the central hub for the Northeast Structural Genomics Consortium, allowing collaborative structural proteomics to be carried out in a distributed fashion. The core of SPINE is a laboratory information management system (LIMS) for key information related to the progress of the consortium in cloning, expressing and purifying proteins and then solving their structures by NMR or X-ray crystallography. Originally, SPINE focused on tracking constructs, but in its current form it is able to track target sample tubes and store detailed sample histories. The core database comprises a set of standard relational tables and a data dictionary that form an initial ontology for proteomic properties and provide a framework for large-scale data mining. Moreover, SPINE sits at the center of a federation of interoperable information resources. These can be divided into (i) local resources closely coupled with SPINE that enable it to handle less standardized information (e.g. integrated mailing and publication lists), (ii) other information resources in the NESG consortium that are inter-linked with SPINE (e.g. crystallization LIMS local to particular laboratories) and (iii) international archival resources to which SPINE links and passes information (e.g. TargetDB at the PDB).
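
The notion of relational tables that track sample tubes and their histories can be illustrated with a small in-memory database. This is only a sketch: the table names, columns, identifiers, and dates below are hypothetical and are not taken from the actual SPINE schema, which the abstract does not describe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: one target, many tubes, many events per tube.
    CREATE TABLE target (
        target_id TEXT PRIMARY KEY,
        organism  TEXT
    );
    CREATE TABLE tube (
        tube_id   INTEGER PRIMARY KEY,
        target_id TEXT NOT NULL REFERENCES target(target_id)
    );
    CREATE TABLE tube_event (
        event_id    INTEGER PRIMARY KEY,
        tube_id     INTEGER NOT NULL REFERENCES tube(tube_id),
        stage       TEXT NOT NULL,
        recorded_on TEXT NOT NULL  -- ISO 8601 date, so text order is date order
    );
    """
)

# Invented example rows.
conn.execute("INSERT INTO target VALUES ('T001', 'example organism')")
conn.execute("INSERT INTO tube VALUES (1, 'T001')")
conn.executemany(
    "INSERT INTO tube_event (tube_id, stage, recorded_on) VALUES (?, ?, ?)",
    [
        (1, "expressed", "2003-01-24"),
        (1, "cloned", "2003-01-10"),
        (1, "purified", "2003-02-03"),
    ],
)

# Reconstruct the history of one tube in chronological order.
history = [
    row[0]
    for row in conn.execute(
        "SELECT stage FROM tube_event WHERE tube_id = ? ORDER BY recorded_on", (1,)
    )
]
print(history)
```

Keeping events in their own table, rather than as columns on the tube, lets a sample accumulate an open-ended history, which is one plausible way to support the "detailed sample histories" mentioned above.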


Subjects
Databases, Protein; Proteins/chemistry; Proteomics; Cooperative Behavior; Database Management Systems; Internet; Proteins/genetics; Proteins/isolation & purification; Proteins/metabolism; Software; Systems Integration
6.
J Mol Evol ; 56(1): 77-88, 2003 Jan.
Article in English | MEDLINE | ID: mdl-12569425

ABSTRACT

Homologues of barley Mlo encode the only family of seven-transmembrane (TM) proteins in plants. Their topology, subcellular localization, and sequence diversification are reminiscent of those of G-protein coupled receptors (GPCRs) from animals and fungi. We present a computational analysis of MLO family members based on 31 full-size and 3 partial sequences, which originate from several monocot species, the dicot Arabidopsis thaliana, and the moss Ceratodon purpureus. This enabled us to date the origin of the Mlo gene family back at least to the early stages of land plant evolution. The genomic organization of the corresponding genes supports a monophyletic origin of the Mlo gene family. Phylogenetic analysis revealed five clades, of which three contain both monocot and dicot members, while two indicate class-specific diversification. Analysis of the ratio of nonsynonymous-to-synonymous changes in coding sequences provided evidence for functional constraint on the evolution of the DNA sequences and purifying selection, which appears to be reduced in the first extracellular loop of 12 closely related orthologues. The 31 full-size sequences were examined for potential domain-specific intramolecular coevolution. This revealed evidence for concerted evolution of all three cytoplasmic domains with each other and the C-terminal cytoplasmic tail, suggesting interplay of all intracellular domains for MLO function.


Subjects
Arabidopsis/genetics; Evolution, Molecular; Phylogeny; Plant Proteins/genetics; Zea mays/genetics; Amino Acid Sequence; Molecular Sequence Data; Sequence Alignment
7.
J Mol Biol ; 324(1): 177-92, 2002 Nov 15.
Article in English | MEDLINE | ID: mdl-12421567

ABSTRACT

Protein-protein interactions play crucial roles in biological processes. Experimental methods have been developed to survey the proteome for interacting partners, and some computational approaches have been developed to extend the impact of these experimental methods. Computational methods are routinely applied to newly discovered genes to infer protein function and plausible protein-protein interactions. Here, we develop and extend a quantitative method that identifies interacting proteins based upon the correlated behavior of the evolutionary histories of protein ligands and their receptors. We have studied six families of ligand-receptor pairs: the syntaxin/Unc-18 family, the GPCR/G-alpha's, the TGF-beta/TGF-beta receptor system, the immunity/colicin domain collection from bacteria, the chemokine/chemokine receptors, and the VEGF/VEGF receptor family. For correlation scores above a defined threshold, we were able to find an average of 79% of all known binding partners. We then applied this method to find plausible binding partners for proteins with uncharacterized binding specificities in the syntaxin/Unc-18 protein and TGF-beta/TGF-beta receptor families. Analysis of the results shows that co-evolutionary analysis of interacting protein families can reduce the search space for identifying binding partners, not only by finding binding partners for uncharacterized proteins but also by recognizing potentially new binding partners for previously characterized proteins. We believe that correlated evolutionary histories provide a route to exploit the wealth of whole genome sequences and recent systematic proteomic results to extend the impact of these studies and focus experimental efforts to categorize physiologically or pathologically relevant protein-protein interactions.
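
One common way to score correlated evolutionary histories is to compare the pairwise distance matrices of two protein families, with rows ordered so that row i of each matrix corresponds to a putative ligand-receptor pair. The sketch below follows that distance-matrix ("mirror tree" style) idea as an assumption; the abstract does not spell out its scoring, and the matrices are invented toy values.

```python
import math
from itertools import combinations

def upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square distance matrix."""
    return [matrix[i][j] for i, j in combinations(range(len(matrix)), 2)]

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def coevolution_score(dist_a, dist_b):
    """Correlation between two families' pairwise evolutionary distances."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))

def reorder(matrix, order):
    """Permute rows and columns, i.e. try a different ligand-receptor pairing."""
    return [[matrix[i][j] for j in order] for i in order]

# Toy distance matrices (hypothetical) for four ligands and four receptors.
ligand = [
    [0.0, 0.1, 0.5, 0.6],
    [0.1, 0.0, 0.4, 0.7],
    [0.5, 0.4, 0.0, 0.2],
    [0.6, 0.7, 0.2, 0.0],
]
receptor = [
    [0.0, 0.2, 0.9, 1.1],
    [0.2, 0.0, 0.8, 1.3],
    [0.9, 0.8, 0.0, 0.3],
    [1.1, 1.3, 0.3, 0.0],
]

matched = coevolution_score(ligand, receptor)
shuffled = coevolution_score(ligand, reorder(receptor, [2, 0, 3, 1]))
print(f"matched pairing: {matched:.3f}, shuffled pairing: {shuffled:.3f}")
```

A pairing that scores well above alternative pairings is a candidate set of binding partners, which is how such a score can narrow the experimental search space.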


Subjects
Caenorhabditis elegans Proteins; Carrier Proteins; Evolution, Molecular; Models, Biological; Phosphoproteins; Proteins/metabolism; Vesicular Transport Proteins; Algorithms; Chemokines/metabolism; Colicins/metabolism; Endothelial Growth Factors/metabolism; GTP-Binding Proteins/metabolism; Helminth Proteins/metabolism; Intercellular Signaling Peptides and Proteins/metabolism; Lymphokines/metabolism; Membrane Proteins/metabolism; Phylogeny; Protein Subunits; Qa-SNARE Proteins; Receptors, Adrenergic/metabolism; Receptors, Chemokine/metabolism; Receptors, Transforming Growth Factor beta/metabolism; Receptors, Vascular Endothelial Growth Factor/metabolism; Transforming Growth Factor beta/metabolism; Vascular Endothelial Growth Factor A; Vascular Endothelial Growth Factors